
[NO REVIEW] Fix 400/1024 error when container is re-created with same name #49131

Closed
xinlian12 wants to merge 3 commits into Azure:main from xinlian12:fix/container-recreate-stale-rid


@xinlian12
Member

Problem

When a container is deleted and re-created with the same name, subsequent read/write/query operations fail with:

com.azure.cosmos.CosmosException: {"innerErrorMessage":"{\"Errors\":[\"Collection rid provided by the user does not match the existing collection.\"]}, StatusCode: BadRequest"}

(HTTP 400, sub-status 1024 / INCORRECT_CONTAINER_RID_SUB_STATUS)

This affects gateway mode, because the stale x-ms-cosmos-intended-collection-rid header persists across retries.

Root Cause

StaleResourceRetryPolicy correctly handles 400/1024 by refreshing the collection cache and clearing session tokens, but it did not reset the request context before the retry:

  1. request.requestContext.resolvedCollectionRid still held the old collection RID
  2. request.forceNameCacheRefresh was still false
  3. The x-ms-cosmos-intended-collection-rid HTTP header still carried the old RID

On retry, RxGatewayStoreModel.addIntendedCollectionRid() checks whether the header is already set and, if so, skips updating it. The stale RID is therefore sent again, causing the same 400/1024 error.
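The skip-if-present behavior can be illustrated with a minimal, self-contained sketch. RetriedRequest, addIntendedCollectionRidIfAbsent, and the "oldRid"/"newRid" values are hypothetical stand-ins, not SDK types; the sketch only assumes, as described above, that the header is written only when it is not already present on the request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request object that is reused across retries.
final class RetriedRequest {
    final Map<String, String> headers = new HashMap<>();
}

final class IntendedRidHeaderDemo {
    static final String INTENDED_COLLECTION_RID = "x-ms-cosmos-intended-collection-rid";

    // Models the described behavior: the header is only written if it is absent.
    static void addIntendedCollectionRidIfAbsent(RetriedRequest request, String resolvedRid) {
        request.headers.putIfAbsent(INTENDED_COLLECTION_RID, resolvedRid);
    }

    // First attempt resolves the old RID; the retry resolves the new RID after a cache refresh.
    // Returns the header value that the retry would actually send.
    static String headerAfterRetry(boolean clearHeaderBeforeRetry) {
        RetriedRequest request = new RetriedRequest();
        addIntendedCollectionRidIfAbsent(request, "oldRid");   // first attempt
        if (clearHeaderBeforeRetry) {
            request.headers.remove(INTENDED_COLLECTION_RID);    // what the fix adds
        }
        addIntendedCollectionRidIfAbsent(request, "newRid");   // retry
        return request.headers.get(INTENDED_COLLECTION_RID);
    }
}
```

Without clearing, the retry still carries "oldRid"; removing the header first lets the retry write "newRid".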

Fix

After refreshing the collection cache in StaleResourceRetryPolicy.shouldRetry(), reset:

  • resolvedCollectionRid → null
  • forceNameCacheRefresh → true
  • Remove the INTENDED_COLLECTION_RID_HEADER

This ensures the retry re-resolves the collection and sends the correct (new) RID.
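A simplified sketch of the three-part reset follows. SketchRequest, SketchRequestContext, and StaleRidReset are hypothetical stand-ins that only borrow the field names from the description; the real RxDocumentServiceRequest and StaleResourceRetryPolicy carry much more state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the request context.
final class SketchRequestContext {
    String resolvedCollectionRid;
}

// Hypothetical, simplified stand-in for the request.
final class SketchRequest {
    boolean forceNameCacheRefresh;
    SketchRequestContext requestContext = new SketchRequestContext();
    final Map<String, String> headers = new HashMap<>();
}

final class StaleRidReset {
    static final String INTENDED_COLLECTION_RID_HEADER = "x-ms-cosmos-intended-collection-rid";

    // Clears every piece of state that would otherwise pin the retry to the old RID.
    static void resetForRetry(SketchRequest request) {
        if (request == null) {
            return;
        }
        request.forceNameCacheRefresh = true;                      // re-resolve name -> RID
        if (request.requestContext != null) {
            request.requestContext.resolvedCollectionRid = null;   // forget the cached RID
        }
        request.headers.remove(INTENDED_COLLECTION_RID_HEADER);    // let the retry set it again
    }
}
```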

Testing

Added a parameterized unit test, requestContextResetOnRetry, covering both error codes:

  • 410/1000 (NAME_CACHE_IS_STALE — direct mode)
  • 400/1024 (INCORRECT_CONTAINER_RID_SUB_STATUS — gateway mode)

The test verifies that after shouldRetry():

  • resolvedCollectionRid is cleared to null
  • forceNameCacheRefresh is set to true
  • INTENDED_COLLECTION_RID_HEADER is removed
  • Old session tokens are cleaned up
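The shape of that parameterized check can be sketched in plain Java without a test framework. StaleCaseRequest and StaleCasePolicy are hypothetical; session tokens are modeled as a map keyed by collection RID purely for illustration, which is an assumption and not how the SDK's session container stores them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-model of the state the test inspects.
final class StaleCaseRequest {
    String resolvedCollectionRid = "oldRid";
    boolean forceNameCacheRefresh = false;
    final Map<String, String> headers = new HashMap<>();
    // Simplification: session tokens keyed by collection RID.
    final Map<String, String> sessionTokens = new HashMap<>();
}

final class StaleCasePolicy {
    static final String INTENDED_RID = "x-ms-cosmos-intended-collection-rid";

    // Retries only the two stale-container codes and resets request state.
    static boolean shouldRetry(StaleCaseRequest request, int statusCode, int subStatusCode) {
        boolean nameCacheStale = statusCode == 410 && subStatusCode == 1000;
        boolean incorrectRid = statusCode == 400 && subStatusCode == 1024;
        if (!nameCacheStale && !incorrectRid) {
            return false;
        }
        request.sessionTokens.remove(request.resolvedCollectionRid); // drop tokens for the old RID first
        request.resolvedCollectionRid = null;
        request.forceNameCacheRefresh = true;
        request.headers.remove(INTENDED_RID);
        return true;
    }

    // One parameterized case: checks the four expectations listed above.
    static boolean verifyCase(int statusCode, int subStatusCode) {
        StaleCaseRequest request = new StaleCaseRequest();
        request.headers.put(INTENDED_RID, "oldRid");
        request.sessionTokens.put("oldRid", "0:1#100");
        boolean retried = shouldRetry(request, statusCode, subStatusCode);
        return retried
            && request.resolvedCollectionRid == null
            && request.forceNameCacheRefresh
            && !request.headers.containsKey(INTENDED_RID)
            && !request.sessionTokens.containsKey("oldRid");
    }
}
```

Calling verifyCase(410, 1000) and verifyCase(400, 1024) mirrors the two parameter rows.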

Fixes #49097

Reset request context (resolvedCollectionRid, forceNameCacheRefresh,
INTENDED_COLLECTION_RID_HEADER) in StaleResourceRetryPolicy before retry
so the retry re-resolves the collection and sends the correct RID.

Fixes Azure#49097

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 8, 2026 19:35
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners May 8, 2026 19:35
@github-actions github-actions Bot added the Cosmos label May 8, 2026
Contributor

Copilot AI left a comment


Pull request overview

This pull request addresses a Cosmos DB retry bug where requests can keep sending a stale x-ms-cosmos-intended-collection-rid header after a container is deleted and re-created with the same name (notably impacting gateway mode and 400/1024 retries). It resets request context state after refreshing the collection cache so the retry re-resolves the container RID and updates headers correctly.

Changes:

  • Reset RxDocumentServiceRequest retry state in StaleResourceRetryPolicy (force name cache refresh, clear resolved collection RID, remove intended-collection-rid header).
  • Add a parameterized unit test validating request-context/header reset behavior for both 410/1000 and 400/1024 scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files changed:

  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/StaleResourceRetryPolicy.java: Resets request context and intended collection RID header after cache refresh to ensure retry re-resolves the container.
  • sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/StaleResourceExceptionRetryPolicyTest.java: Adds parameterized unit coverage ensuring request context/header/session cleanup occurs on stale-container retry paths.

    // Reset request context so the retry re-resolves the collection
    // and sends the updated intended-collection-rid header.
    if (this.request != null) {
        this.request.forceNameCacheRefresh = true;
        this.request.requestContext.resolvedCollectionRid = null;
Member Author


Do we still need this.request.forceNameCacheRefresh = true;? The collection cache is already refreshed above.

Why not just set this.request.requestContext.resolvedCollectionRid = refreshedCollectionRid?

xinlian12 and others added 2 commits May 8, 2026 12:43
…eRefresh

- Set resolvedCollectionRid to refreshedCollectionRid instead of null
- Remove unnecessary forceNameCacheRefresh (cache already refreshed)
- Add requestContext null guard to prevent NPE

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
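The revised reset described in this commit can be sketched as follows. RevisedRequest, RevisedContext, and RevisedReset are hypothetical stand-ins, and the sketch assumes the intended-collection-rid header removal from the first commit is kept; it reflects the commit message, not the actual diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; requestContext may be null, hence the guard below.
final class RevisedContext {
    String resolvedCollectionRid;
}

final class RevisedRequest {
    boolean forceNameCacheRefresh;
    RevisedContext requestContext;
    final Map<String, String> headers = new HashMap<>();
}

final class RevisedReset {
    static final String HEADER = "x-ms-cosmos-intended-collection-rid";

    static void applyRefreshedRid(RevisedRequest request, String refreshedCollectionRid) {
        if (request == null) {
            return;
        }
        // Null guard added to avoid an NPE when no request context is attached.
        if (request.requestContext != null) {
            // Use the RID from the refreshed cache directly instead of clearing it.
            request.requestContext.resolvedCollectionRid = refreshedCollectionRid;
        }
        // Assumed to be kept: drop the stale header so the retry writes the refreshed RID.
        request.headers.remove(HEADER);
        // forceNameCacheRefresh is intentionally left unchanged: the cache was already refreshed.
    }
}
```

Assigning the refreshed RID avoids a second name-cache lookup on the retry, which is the point raised in the review comment.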
Remove INTENDED_COLLECTION_RID_HEADER in RenameCollectionAwareClientRetryPolicy
(404/1002 READ_SESSION_NOT_AVAILABLE) to prevent stale RID on retry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
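After this commit, three status/sub-status pairs lead to the stale intended-collection-rid header being cleared. The small lookup below is a hypothetical summary of that routing as described on this page; StaleRidCodes is not an SDK class, and the real policies dispatch on exception types rather than on a helper like this.

```java
// Hypothetical summary of which retry policy (per this PR) clears the header.
final class StaleRidCodes {
    static String policyClearingHeader(int statusCode, int subStatusCode) {
        // 410/1000 NAME_CACHE_IS_STALE and 400/1024 INCORRECT_CONTAINER_RID_SUB_STATUS
        if ((statusCode == 410 && subStatusCode == 1000)
                || (statusCode == 400 && subStatusCode == 1024)) {
            return "StaleResourceRetryPolicy";
        }
        // 404/1002 READ_SESSION_NOT_AVAILABLE
        if (statusCode == 404 && subStatusCode == 1002) {
            return "RenameCollectionAwareClientRetryPolicy";
        }
        return null; // not a stale-container code in this PR's scope
    }
}
```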
@xinlian12 xinlian12 closed this May 8, 2026


Development

Successfully merging this pull request may close these issues.

[BUG]Request fail with 400/1024 when container is re-created with same name

2 participants